Nature Biomedical Engineering
○ Springer Science and Business Media LLC
Preprints posted in the last 7 days, ranked by how well they match Nature Biomedical Engineering's content profile, based on 42 papers previously published here. The average preprint has a 0.07% match score for this journal, so anything above that is already an above-average fit.
Zhang, C.; Chen, Y.-L.; Jamilov, A.; Liu, E.; Shree, S.; Lam, B. D.; Foy, B. H.
Show abstract
Most routine clinical markers are interpreted using population-based reference intervals, despite being regulated around patient-specific homeostatic setpoints. This mismatch obscures physiologic shifts, inhibiting detection of early disease signatures. Here, we develop a novel Bayesian inference method that adaptively constructs personalized reference intervals using each patients existing health records. In analysis of >100 million lab tests in >800,000 patients, these personalized intervals can be accurately constructed with only minimal prior data, meaning this method can be applied near universally. We show that across 43 common lab markers, patient setpoints are strongly associated with future morbidity, with signal strength increasing as more test data is collected. Deviation from personalized reference intervals provides strong and novel risk signatures across diverse disease states, including hypothyroidism, hematologic cancers, kidney disease, and pregnancy complications. Importantly, personalized reference intervals capture a different risk signature to existing population-based approaches, with the highest risk patients being those who deviate from both intervals simultaneously. In a targeted clinical use case study of iron infusion, use of personalized reference intervals greatly improved prediction of treatment efficacy and allowed precise tracking of treatment responses. Our results illustrate how existing health records can be used to construct personalized benchmarks for nearly all common clinical tests, driving a new paradigm for precision laboratory medicine.
Elemento, O.; Sigaras, A.; Colonel, J.; Hajirasouliha, I.; Ghosh, S.; Bensoussan, Y.; Bridge2AI-Voice Consortium, ; Rameau, A.
Show abstract
Vocal biomarkers, encompassing voice and speech, have largely been developed for individual conditions in isolation, limiting their generalizability across diseases and recording settings. To address this, we introduce VoiceFM, a contrastive model that learns general-purpose clinical voice representations by aligning audio embeddings with rich clinical metadata. Using the Bridge2AI-Voice dataset (984 primarily English-speaking adult participants, 846 used for training and 138 held out as a temporally separated validation cohort, 40,056 recordings totaling 176 hours across 5 academic medical centers), VoiceFM pairs a fine-tuned Whisper large-v2 encoder with a tabular transformer over 44 clinical features via symmetric InfoNCE loss. Linear probes on frozen VoiceFM embeddings achieve mean AUROC 0.952 +/- 0.005 across five evaluation tasks (control vs disease screening plus four disease categories), significantly outperforming Frozen Whisper (0.926 +/- 0.013, p = 0.013), Frozen HuBERT (0.885 +/- 0.017, p = 0.0009), and the contrastively trained VoiceFM-HuBERT (0.938 +/- 0.006, p = 0.012). On the 138-participant held-out cohort, VoiceFM-Whisper achieves AUROCs of 0.99 for Alzheimer's/dementia/MCI and 0.89 for airway stenosis, demonstrating that the learned representations generalize to participants the model has never seen. VoiceFM representations transfer to three external datasets without retraining and improve few-shot classification. Recording task attribution identifies a small set of speech tasks that match or exceed the full battery's performance, suggesting shorter screening protocols are feasible. Trained predominantly on English audio, VoiceFM transfers without fine-tuning to Spanish-language Parkinson's disease (PD) detection (NeuroVoz, 107 participants, AUROC 0.93 +/- 0.02), with the signal dominated by articulatory rather than phonatory features. A fine-tuned classifier achieves participant-level AUROC 0.87 (sustained 0.85, countdown 0.80) on the mPower smartphone study (585 held-out participants). Together, these results show that contrastive alignment between voice and rich clinical metadata can serve as the basis for a clinical voice foundation model, producing a single set of transferable representations that generalize across diseases, languages, recording conditions, and patients enrolled after model freeze.
Lu, S.; Ruan, X.; Wang, L.; Wang, X.; Sameer, M.; Liu, H.
Show abstract
Although GLP1/GIP receptor agonists demonstrate unprecedented weight loss efficacy, their rapid clinical adoption has revealed significant real-world tolerability challenges. To evaluate their dynamic safety profiles, we developed a macro to micro pharmacovigilance framework by combining global FAERS reports with local UT Physician EHR. Macroscopically, we distilled 17 shared adverse events across the drug class from FAERS with disproportionality analysis. Microscopically, local EHR data (289,655 longitudinal treatment sessions across 71,316 patients) revealed 51.6% of GLP1 sessions terminated within 90 days. Furthermore, temporal stratified logistic regression demonstrated that initial exposure (0 to 30 days) correlated strongly with nausea and vomiting, which attenuated in extended sessions, whereas extended exposure (>2 years) uncovered late onset risks, notably incident hepatic steatosis. Ultimately, this time aware framework reveals that GLP1 safety profiles are profoundly duration dependent, providing critical insights into both acute intolerances and long-term medication safety.
Shah, M.
Show abstract
Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease affecting more than 450,000 individuals worldwide and is frequently diagnosed more than 12 months after symptom onset, delaying intervention during a critical early window. Because up to 80% of patients develop dysarthria within two years, subtle changes in speech provide a signal of early bulbar motor neuron degeneration. However, existing speech-based systems rely on supervised classification trained on limited datasets, achieving moderate sensitivity and depending heavily on labeled disease examples, which restrict scalability and early detection. This study introduces SPEAK-NORM, the first-ever normative speech modeling framework for early ALS diagnosis, which learns age- and sex-conditioned motor-speech distributions exclusively from healthy individuals. A conditional variational autoencoder models coordination of hypoglossal, laryngeal, and respiratory motor pathways, and deviation from this healthy manifold is quantified through latent representations and reconstruction error to form a 354-dimensional profile. A calibrated linear Support Vector Machine performs subject-level classification under subject-disjoint validation. On the VOC-ALS database (n = 153), SPEAK-NORM achieves 98% accuracy with balanced sensitivity and specificity, significantly outperforming established clinical acoustic indices and prior systems. The framework maintains strong performance under cross-task generalization and when retrained on healthy controls in independent dementia and Parkinson disease cohorts, demonstrating disease-specific deviation patterns rather than generic neurodegenerative change. Spectral, temporal, and latent separations further support interpretability. By modeling healthy speech instead of memorizing disease examples, SPEAK-NORM enables scalable early neuromotor screening using recording devices, with potential to support earlier diagnosis, differential classification, and monitoring of ALS progression.
Poulakis, K.; Ioannou, K.; Bezgin, G.; Chiotis, K.; Iturria-Medina, Y.
Show abstract
Can we decode Alzheimers disease (AD) heterogeneity into a few portable axes that capture how amyloid-{beta}, tau and neurodegeneration (A-T-N) spatially co vary in vivo? To answer this question, we built a pipeline that harmonizes longitudinal amyloid-{beta}/tau PET and T1 MRI (gray matter) from ADNI cohort (12,430 images) with mixed effects modeling and then derived stage specific multimodal axes (mVCs) using linked component analysis, with robustness tested in simulations and external validation in the OASIS cohort (4,958 images). We identified a small set of multimodal axes that (i) recapitulate early tau weighted variation in cognitively unimpaired (CU) individuals, AD like A-T-N coupling in cognitively impaired (CI) individuals and atypical CU and CI participants with posterior (precuneus/occipitoparietal) and fronto insular/frontal weighted patterns, (ii) map onto domain specific cognition, APOE e4, and blood/CSF biomarkers of neurodegeneration, neuroaxonal injury and astrocyte activation, (iii) predict clinical transitions, (iv) generalize in an independent cohort, and (v) demonstrate modelling robustness to missing data, high dimensionality, and cross-cohort variability, enabling direct application of the extracted axes to new datasets for biomarker discovery and stratification. Multimodal axes provide a portable, interpretable layer for quantifying amyloid-{beta}-tau-neurodegeneration coupling at the individual level, complementing current biomarker-based staging frameworks based on A-T-N status and tau PET topography, and can be computed on new datasets to aid clinical assessment and trial enrichment.
Cavon, J.; Perez, C.; Quinn-Bohmann, N.; Magis, A. T.; Gibbons, S. M.
Show abstract
Emerging evidence links the gut microbiome to sleep quality, yet measuring sleep at scale remains challenging. Commercial wearables, such as Fitbit, capture objective sleep and activity data in naturalistic settings. We integrated Fitbit data from a large, deeply-phenotyped cohort with paired lifestyle and health questionnaires. Wearable-derived measures aligned well with self-reported sleep, activity, and happiness. We identified dozens of covariate-adjusted associations between Fitbit-derived sleep features, lifestyle factors, and multi-omic data. Among molecular feature sets, the gut microbiome showed the greatest number of associations with sleep quality: butyrate-producing genera were positively associated with sleep and amplified the benefits of physical activity. Oscillospira, in particular, was consistently associated with better sleep. In blood, insulin, omega-3, and cortisol correlated with poorer sleep, whereas lower alcohol intake and mineral supplements correlated with better sleep. These robust, covariate-adjusted findings advance mechanistic understanding of the gut-sleep axis and broader molecular and lifestyle determinants of sleep quality.
Yang, Y.; Peracchio, L.; Mayourian, J.; Miller, T.; La Cava, W.
Show abstract
Background Artificial intelligence-enhanced electrocardiography (AI-ECG) enables scalable, low-cost cardiac dysfunction screening, but existing models are annotation-intensive and predominantly adult-derived, leaving paediatric generalizability uncertain. Paediatric cohorts exhibit highly variable cardiac morphology and function compared to adults, which may be useful for learning generalizable AI-ECG models. Methods We pretrained ECG-Fyler on a predominantly paediatric, all-age cohort at Boston Children's Hospital (1992-2023), annotated with a cardiology-specific coding system (Fyler codes), and evaluated it on assessments from echocardiography (echo) and cardiac magnetic resonance (CMR) studies. We validated on an external adult cohort from Columbia University Irving Medical Center. Performance was benchmarked against several AI-ECG foundation models by AUROC across age groups, lesion types, and limited-data scenarios. Findings The pretraining cohort comprised 782,138 ECGs from 255,271 patients (median age: 10.9 years, IQR: [2.8-16.8]). Internal evaluation included 178,495 ECG-echo pairs (median age: 10.9 [3.7-17.0]) and 8,584 ECG-CMR pairs (median age: 20.7 [15.6-29.6]). External validation included 82,543 ECG-echo pairs from adults (median age: 64.0 [52.0-74.0]). ECG-Fyler improved AUROC across biventricular dysfunction and dilation tasks, with the largest gains in low-data settings. In internal validation, ECG-Fyler detected low left ventricular ejection fraction (LVEF [≤] 40%) from only 100 fine-tuning samples (AUROC: 0.80, 95% CI: [0.78-0.80]), outperforming other models (AUROC < 0.65) and improving with additional fine-tuning (AUROC: 0.94 [0.93-0.94]). Similar improvements were observed for CMR-derived LVEF, RVEF, and ventricular dilation. In external validation on adults, ECG-Fyler exhibited an AUROC of 0.83 (CI: [0.82-0.85]) for LVEF [≤] 40%. After fine-tuning on less than 10% of external data, LVEF [≤] 45% performance (AUROC: 0.87 [0.86-0.88]) outperformed a fully trained, site-specific prior model (AUROC: 0.85 [0.84-0.87]). Interpretation Pretraining on richly annotated, paediatric-dominant ECGs yields models that transfer efficiently across institutions and ages, supporting AI-ECG screening and triage when labels or imaging access are limited. Funding National Institutes of Health (R01LM012973); Kostin Innovation Fund, Boston Children's Hospital
Nag, S.; Banerjee, S.; Banerjee, S.; Ghosh, S.; Bera, A.; Shanmugam, S.; Mondal, A.; Chakraborty, S.
Show abstract
Tuberculosis (TB) remains one of the deadliest infectious diseases, with over a million deaths annually and a growing threat from multidrug-resistant strains (MDR-TB). A major bottleneck in controlling TB is the lack of truly portable, rapid, and user-friendly diagnostic systems that can operate effectively in decentralized, resource-constrained settings. Here, we present a first-of-its-kind, portable nucleic-acid-based diagnostic platform that enables both primary TB screening and detection of drug resistance within the same unified framework, without any change in the operative embodiment. The system integrates loop-mediated isothermal amplification (LAMP) targeting dual Mycobacterium tuberculosis markers (IS6110 and IS1081) with a compact, AI-enabled device and smartphone-based readout, delivering rapid and reliable results at the point-of-care. Clinical evaluation across 105 samples demonstrated high sensitivity and specificity. Further validation through real-world deployment in a primary healthcare setting, using a single-gene (IS6110) configuration operated by minimally trained personnel, yielded 95.60% sensitivity and 100% specificity, benchmarked against GeneXpert. Critically, the same platform architecture, without modification, extends seamlessly to drug-resistance profiling, demonstrated here through a probe-free, allele-specific LAMP approach for identifying key mutations associated with rifampicin (rpoB) and isoniazid (katG) resistance. By combining robust molecular diagnostics with AI-driven automation in a compact and accessible format, this work represents a significant medical advancement toward democratizing TB care. The platform thus holds strong potential to enable early screening, guide timely treatment decisions, reduce transmission, and substantially strengthen global TB elimination efforts, particularly in high-burden, low-resource settings.
Napier, A.; Wiley, J.; Heslin, M.
Show abstract
A closed-loop quality system deployed across thirteen US hospital sites resolved physician complaints with zero regressions on 42 tracked cases across 1,089 optimization iterations, while a deterministic assembly-agent replacement cut H+P trace latency from 19.6 s to 10.8 s (-8.8 s, 95% CI [-10.5, -7.1] s; n = 100 pre, n = 100 post). We report four observations and an architectural follow-through. First, the same binary-check instrument produces opposite outcomes depending on the question asked: "maximize this score" produces structurally-correct notes that physicians reject (Spearman rho = -0.077, 95% CI [-0.40, 0.26], n = 36); "did this specific fabrication stop?" produces rater-invariant deployment decisions. Second, in our pipeline, assembly-stage agents did not respond to prompt optimization the way reasoning agents did: four consecutive optimization attempts produced 18-28 point regressions. Third, physician preference is rater-fragile at typical clinical-AI calibration sample sizes (Cohen's kappa = 0.028 between two board-certified physicians, 95% CI [-0.30, 0.36] on n = 35 overlapping pairs). Fourth, the architectural punchline: six weeks after the prediction, the LLM call at the chart-assembly step was replaced with a deterministic renderer (sub-500-character template plus sandboxed scripting), lifting the defect-free rate on a 51-case holdout from 49% to 84%. We introduce a Pareto-with-absolute-floors acceptance rule (multi-axis commit with severity-class categorical vetoes) as a methodological contribution distinct from scalar-reward acceptance in standard prompt-optimization frameworks. Cross-iteration rejection memory prevents the loop from re-proposing edits already rejected three or more times. A reproducibility bundle (anonymized ablation per-case counts, bootstrap-CI data, analysis scripts) is released under CC BY 4.0 at github.com/sayvant/SQS-Auditor-paper-data.
Zhao, J.; Ahmadi, S.-A.; Decker, J.; Zwergal, A.; Eulenburg, P. z.; Flanagin, V. L.; Wuehr, M.
Show abstract
Quantitative eye movement analysis is important for neuro- logical diagnostics, yet existing video-oculography (VOG) systems typ- ically require calibration, device-specific settings, or accurate gaze la- bels. We present VOGeo-Gaze, a real-time, calibration-free, geometry- aware neural network that estimates gaze by reconstructing anatomi- cally meaningful eyeball parameters from image features. The method combines segmentation-driven projection geometry, a refraction-aware pupil correction module, and temporal anatomical stabilization, so gaze is derived from interpretable eye geometry rather than direct angular regression. Trained only on the public TEyeD dataset with weak gaze supervision, VOGeo-Gaze was evaluated on 116 clinical recordings from 17 patients and 19 healthy subjects using EyeSeeCam, a clinical gold- standard VOG system. It achieved median absolute angular errors of 0.33{whitebullet} horizontally and 0.35{whitebullet} vertically, with nearly 92% of recordings below 1{whitebullet} error while operating at >300 FPS. These results demonstrate sub-degree clinical gaze estimation without subject-specific calibration, camera intrinsics, or accurate gaze labels, providing a scalable and inter- pretable alternative to conventional VOG pipelines. Code is available at https://github.com/DSGZ-MotionLab/VOGeo-Gaze.
Minoccheri, C.; Joo, P.; Hu, X.-S.; Affendi, H.; Elayyan, F.; Harville, A.; McDonald, N. J.; Botero, T.; DaSilva, A. F.
Show abstract
Neuroimaging based pain decoding faces two underappreciated challenges: between subject variability that prevents classifiers from generalizing across patients, and within session cross validation designs that inflate reported accuracy by conflating within person and between person variance. Here we address both using portable functional near infrared spectroscopy (fNIRS) during pharmacologically verified local nerve anesthesia. Twentyfive patients with clinically painful teeth underwent 36 channel bilateral fNIRS during percussion before ("Pre") and after ("Post") local nerve anesthesia. In 13 block-success patients, a paired Pre versus Post comparison with healthy tooth control identified three temporal hemodynamic response function (HRF) features (late slope, mean first derivative, and baseline normalized amplitude) whose analgesia interaction effects (d = 0.63 to 0.79) exceeded that of raw general linear model (GLM) amplitude (d = 0.56), with a significant difference-in-differences interaction (p = 0.011). Per-patient calibration with these features yielded leave one subject out (LOSO) AUC = 0.68 to 0.76 for nonlinear classifiers (permutation p = 0.002), with HbO-specific feature selection achieving the best performance (RF AUC = 0.760); a healthy tooth negative control was non-significant. End to end deep learning on raw time series (CNN LSTM AUC = 0.719) was competitive with feature based classifiers, while linear models did not reach significance. Critically, head to head comparison of within-session CV and LOSO on the same data revealed mean inflation of +0.13 AUC across all model types, including deep learning, demonstrating that high within session accuracy alone does not establish subject-independent validity. Exploratory analyses suggested complementary roles for oxyhemoglobin (HbO; within patient analgesia detection) and deoxyhemoglobin (HbR; cross patient information), and that trial to trial response variability may complement amplitude for cross patient pain detection. These results show that per patient calibration with temporal HRF features supports subject independent analgesic-state detection under strict LOSO evaluation, and that within-session validation (standard in the fNIRS pain- decoding literature) can substantially overestimate performance.
Yun, Y.; Hao, X.; Zhang, Y. D.
Show abstract
Quantifying uncertainty in polygenic score (PGS)-based phenotype prediction is crucial for the integration of genomic data into precision medicine. While the PGS provides a fundamental pivot for point estimation, clinical decision-making necessitates the construction of well-calibrated prediction intervals that reliably encompass the true phenotypic values. However, phenotypic residuals are frequently characterized by complex heteroscedasticity and stratified variance structures across diverse demographic contexts. Existing approaches often rely on global calibration mechanisms, which fail to account for such localized variance structures and lead to systematic miscalibration within specific subpopulations. To bridge this gap, we propose Clustering-based Split Conformal Prediction with Normalized Residuals (C-SCNR), a versatile framework based on Split Conformal Prediction. By adopting residual normalization and incorporating a repetitive `split-and-cluster` mechanism, C-SCNR dynamically identifies latent error strata and applies fine-grained adjustments to the resulting intervals. Our framework requires no distributional assumptions regarding the phenotype, is compatible with any PGS method, and flexibly accommodates biologically-informed grouping. Simulation studies demonstrate that our framework consistently outperforms existing methods across diverse error distributions. In real-data applications analyzing Body mass index (BMI), Low-density lipoprotein (LDL) cholesterol, and High-density lipoprotein (HDL) cholesterol in the UK Biobank, C-SCNR effectively resolves the coverage deficiencies of existing methods in specific subgroups and consistently yields superior localized calibration. Overall, C-SCNR represents a flexible and powerful framework for constructing high-resolution context-specific prediction intervals, thereby facilitating more reliable clinical interpretations of polygenic risk.
Kurt, F.; Subasi, S. N.; Yakisan, E. S.; Subasi, A.
Show abstract
Background: Wearable technologies enable scalable and continuous monitoring of emotional states through passive sensing of physiological and behavioral signals. However, conventional learning approaches often struggle to model the complex temporal, contextual, and relational dependencies underlying human emotions. To address these limitations, we propose a graph-based framework that represents multimodal wearable observations as heterogeneous knowledge graphs enriched with semantic information derived from Large Language Models (LLMs), enabling richer contextual understanding beyond raw sensor measurements. Methods: We constructed a heterogeneous knowledge graph using multimodal Fitbit physiological signals and affective self-report data collected from 45 users. Framing mood prediction and emotion detection was formulated as both binary and ternary node classification tasks. We evaluated five baseline heterogeneous Graph Neural Network (GNN) architectures and compared them with the proposed Semantically Gated Augmented Graph Neural Network (SeGA-GNN) framework, which dynamically integrates LLM-generated semantic embeddings into graph representations through a gated cross-modal fusion mechanism. Results: The baseline GNN models achieved strong performance, with classification accuracies ranging from 0.7525 to 0.9739 for binary classification and 0.6249 to 0.9699 for ternary classification. The proposed SeGA framework consistently improved predictive performance across most architectures. In particular, semantic augmentation transformed the HAN model from moderate baseline performance into near-perfect emotion recognition capability, achieving SeGA-HAN Accuracy = 0.9988 and AUC = 1.0000 for binary classification and Accuracy = 0.9979 and AUC = 1.0000 for ternary classification. Discussion and Conclusion: Integrating LLM-derived semantic contextualization into heterogeneous graph learning enables effective modeling of contextual information that is not directly captured by wearable physiological signals alone. The proposed SeGA-GNN framework demonstrates that adaptive semantic fusion substantially improves the accuracy, robustness, and interpretability of wearable-based emotion detection. These findings establish a promising direction for next-generation wearable affective computing systems and intelligent emotion-aware applications.
Hameed, S.; Henry, K.; Jiang, F.; Bhusal, B.; Dillenbeck, H.; Gakenheimer-Smith, L.; Webster, G.; Golestani Rad, L.
Show abstract
Pediatric patients with cardiac implantable electronic devices (CIEDs) face limited MRI access due to RF-induced heating, and computational modeling is increasingly used to characterize this risk. The validity of these simulations, however, depends on pairing body models with clinically realistic lead configurations, guidance that is currently lacking. We retrospectively analyzed 302 CIED surgeries in 281 pediatric patients to derive weight-based constraints for simulation design. Weight alone discriminated epicardial from endocardial lead implantation with AUC = 0.90, and adding age and height yielded no improvement, supporting weight as a sufficient single-parameter selection metric. The probabilistic crossover between approaches occurred at 44~kg, substantially higher than the 10 to 15~kg threshold commonly cited in the literature, with a broad transition zone of 21 to 66~kg in which both lead types were routinely used. Lead length was likewise weight-constrained: only 25~cm leads were observed in patients below 6~kg, and leads of 45~cm or longer were uncommon below 50~kg. These findings yield a three-tier framework, with epicardial-only configurations below 21~kg, dual configurations within 21 to 66~kg, and weight-thresholded lead lengths throughout, enabling MRI safety simulations to focus on clinically realizable anatomy and device combinations.
Su, C.-Y.; Butler-Laporte, G.
Show abstract
Yang et al. recently published a systematic comparison of genetic effects on disease susceptibility and disease-specific mortality across nine common diseases and seven biobanks, concluding that susceptibility and survival architectures overlap only modestly. This is an important resource, but we argue that the current mortality genome-wide association studies (GWAS) require explicit power calibration before limited overlap can be interpreted biologically. Using two-sample Mendelian randomization (MR) with positive-control exposures, we show that even a well-powered positive control, body mass index (BMI), instrumented by 855 genome-wide-significant variants, produces a clearly detectable effect for heart failure (HF) mortality, with only weaker evidence for chronic kidney disease (CKD) mortality. However, when BMI instruments were stratified into quartiles by exposure-association strength, the heart failure association remained nominally significant only in the two strongest quartiles and was not significant in the two weakest quartiles. Further, using household income as a weakly instrumented socio-economic contrast has insufficient power to detect moderate effects on any disease mortality outcome. These analyses indicate that current disease mortality GWAS may be insufficiently powered to detect shared effects. In contrast, the same BMI instrument set produced large and directionally coherent effects when applied to case-control GWAS of the matched six diseases, with the HF and prostate cancer associations preserved under a within-family BMI sensitivity analysis, and nominal support for CKD. The HF mortality association was also preserved in a within-family BMI sensitivity analysis. Similarly, genetically proxied household income was associated with HF risk in the case-control GWAS despite null associations with disease-specific mortality, consistent with limited power in the mortality GWAS. These findings indicate that the limited BMI-mortality evidence across several outcomes is unlikely to reflect a weak BMI instrument or dynastic artefacts alone and instead supports limited effective power in current disease-mortality GWAS.
Vanegas Mueller, E.; Joe-Oshodi, A.; Banerjee, A.; Villarroel, M.
Show abstract
Cardiovascular disease is the leading cause of death worldwide. Sudden cardiac death (SCD) accounts for roughly 50% of all cardiac deaths. The electrocardiogram (ECG) is widely used for early diagnosis of cardiac disease. However, the complexity of accurate interpretation limits the ECG's efficacy. Modern deep learning methods have been applied to assist clinicians in diagnosis. We applied Neural Architecture Search (NAS), an automated machine learning technique, to identify optimal deep learning architectures for classifying cardiac arrhythmias from ECGs. We applied the Differentiable Architecture Search strategy to an AutoFormer search space to identify optimal self-attention architectures for arrhythmia classification. We trained, validated, and tested the resulting model on the PhysioNet Challenge 2021 dataset (n = 88,253), comprising ECGs across three continents. We performed a hyperparameter optimisation on the NAS output, exploring input patch size, class weighting, and loss function. We evaluated performance using the PhysioNet Challenge metric and the area under the receiver operating characteristic curve (AUROC). The NAS converged towards minimal architectural configurations (embedding dimension: 384, depth: 4, self-attention heads: 4, MLP ratio: 1) with a validation challenge metric of 0.66 (PhysioNet Challenge 21 Winner: 0.63). The NAS-created network achieved an AUROC of 0.97 and a challenge metric of 0.71 during testing. Normal Sinus Rhythm and Sinus Tachycardia achieved AUROCs of 0.99. Low-QRS Voltage and T-wave abnormality were the worst-performing arrhythmias, with AUROCs of 0.89 and 0.90, respectively. We interpret that architectural simplicity drives performance in arrhythmia classification. Because SCD is unexpected, prevention strategies in free-living environments require lightweight computational resources suitable for wearable devices. Class imbalance fundamentally limits classification performance for rare arrhythmias such as Low-QRS Voltage and T-wave inversion, irrespective of hyperparameter choices. However, the self-attention mechanism can autonomously abstract clinical representations, simplifying clinical deployment by eliminating the need for an explicit feature-extraction pipeline.
Ahmed, Z.; Govindareddy, P.; DeGroat, W.; Narayanan, R.; Peker, E.; Zeeshan, S.
Show abstract
Precision medicine aims to advance our ability from a "one-size-fits-all" approach to personalized and predictive healthcare across diverse populations. It promotes integration of multi-omics and phenotypic data to understand disease mechanisms and discover novel biomarkers and risk factors, which could be used to predict and prevent critical diseases in individual patients across diverse populations. The potential implications of precision medicine approach can accelerate our ability to classify patients at higher risk of developing critical diseases, improve diagnostic capabilities, develop deeper understanding of individual risk, investigate racial differences and demographic characteristics, and find relationships between genetic variants, expressions, and diseases. This study focuses on implementing an innovative and data driven framework of translational bioinformatics and Machine Learning (ML) techniques to analyze multi-omics, including RNA-seq and Whole-Genome Sequencing (WGS) data, generated using blood samples of randomly consented patients. First, we utilized bioinformatics pipelines to identify differentially expressed genes and their pathogenic and likely pathogenic variants for the downstream data analysis, annotation, and visualization. Then, applied a nexus of ML models for multi-omics biomarker discovery, disease prediction, density-based clustering, single-patient profiling, and pathogenicity classification. WGS data analysis supported the exploration of genetic variation and diversity among patients to identify known and novel biomarkers, whereas RNA-seq data analysis improved our understanding of functional and biological pathways that underlying disease states. We classified and clustered pathogenic variants and expressions across various genes and discovered numerous diseases leading risk factors. Our results include gene-disease associations and captured common pathways across the broader population, demonstrating a level of sensitivity and accuracy that has broad clinical implications. We validated our results through clinical records, and state of the science literature. This study delves into the strengths of multi-omics data integration and capabilities of ML application in genetically diverse and complex patient cohorts. Our approach has the potential to elucidate complex gene-disease interactions for genetically diverse populations, which can support earlier diagnoses for patients in many disease realms.
Fayette, L.; Brendel, K.; Mentre, F.
Show abstract
Joint modelling of longitudinal data using non-linear mixed effects models and time-to-event outcomes provides a suitable framework to account for informative censoring when estimating biomarker dynamics and quantifying event risk using covariates and longitudinal trajectories. Their usefulness in clinical research depends on data collection design, particularly to precisely estimate the association (link) parameter between longitudinal and survival processes. However, optimal design strategies have so far been addressed separately for longitudinal and survival endpoints and remain unexplored for joint models. We propose two Fisher Information Matrix (FIM) computation methods for joint models, relying on Monte-Carlo integration over observations combined with either Markov Chains Monte-Carlo or Adaptive Gaussian Quadrature to integrate random effects. Their accuracy is assessed against clinical trial simulations in an oncological example based on the HORIZON III study with a tumour-growth-survival model including discrete and continuous covariates. We apply these methods to quantify the impact of follow-up duration, sampling richness, sample size, and covariate distribution on parameter uncertainty and test power. In our example, longitudinal-parameter uncertainty is barely affected by follow-up duration or sampling richness, whereas survival-parameter uncertainty decreases substantially from 1-year to 2-year follow-up. The number of subjects needed (NSN) to achieve <15\% uncertainty on the link parameter is comparable for a 2-year rich design and a 3-year sparse design. Optimal covariate distributions are stable across designs and systematically improve test power, outperforming longer and richer but non-optimised designs. These FIM-based methods accurately predict uncertainty and test powers, enabling design evaluation and NSN computation for joint-model-based clinical studies.
Faghih, M.; Damm, M.; Kassik, M.-T.; Cheesman, L.; Rauschenberg, S.; Olesen, S. S.; Laheru, D. A.; Zheng, L.; Phillips, A. E.; Yadav, D.; Drewes, A. M.; Rosendahl, J.; Singh, V. K.; International Pancreatic Pain Consortium,
Show abstract
Pain in pancreatic ductal adenocarcinoma (PDAC) is associated with poor survival, but whether altered pain processing carries prognostic significance is unknown. We analyzed a prospective cohort of 143 patients with PDAC who underwent pancreatic quantitative sensory testing (PQST) after diagnosis. Patients were classified as having normal pain processing (n=84), segmental hyperalgesia (n=30), or widespread hyperalgesia (n=29). Survival was measured from the date of P-QST assessment. During follow-up, 70 deaths occurred. Widespread hyperalgesia was associated with increased mortality in unadjusted Cox analysis (HR 1.96, 95% CI 1.14,3.35) and after adjustment for age, sex, tumor stage, comorbidity, opioid treatment, and body mass index (adjusted HR 2.33, 95% CI 1.30,4.15). Segmental hyperalgesia was not associated with mortality. Kaplan Meier analysis demonstrated lower survival probability in the widespread hyperalgesia group (log rank p=0.025). These findings suggest that widespread hyperalgesia, reflecting altered central pain processing, identifies a subgroup of PDAC patients at increased risk of mortality independent of conventional clinical factors.
Hsiao, C.; Cheng, Y.-R.; Yang, C.-Y.; Hsu, F.-S.
Show abstract
Subjective auditory-perceptual evaluation and uninterpretable deep learning models limit the clinical assessment of voice disorders. This study proposes a two-phase zero-shot framework to evaluate voice pathology. First, an Audio Spectrogram Transformer is fine-tuned on the Perceptual Voice Quality Database to generate an acoustic latent space. Second, Orthogonal Procrustes analysis maps these acoustic embeddings directly onto the semantic space of a pre-trained Sentence Transformer. The geometric alignment produced continuous semantic axes that outperformed a supervised machine learning baseline in regressing clinician-rated GRBAS (Grade, Roughness, Breathiness, Asthenia, and Strain) severity scales. Furthermore, these axes correlate with traditional acoustic measures, including Harmonics-to-Noise Ratio and local jitter, while remaining robust when applied to aperiodic signals by not requiring fundamental frequency extraction. Most importantly, the model achieved zero-shot semantic expansion, successfully evaluating voices using an untrained, natural clinical vocabulary beyond the GRBAS scale. External validation on the Voice ICarus Database confirmed cross-corpus stability and demonstrated the capacity for zero-shot differential phenotyping of specific etiologies, such as hypokinetic dysphonia and reflux laryngitis. By bridging acoustic and semantic latent spaces, this framework offers an objective, continuous, and transparent metric for evaluating voice quality using voice descriptive vocabulary.